Predicting Head Pose from Speech with a Conditional Variational Autoencoder

Authors

David Greenwood

Stephen D. Laycock

Iain Matthews

Abstract

Natural movement plays a significant role in realistic speech animation. Numerous studies have demonstrated how much visual cues contribute to whether we, as human observers, find an animation acceptable. Rigid head motion is one visual mode that universally co-occurs with speech, so it is a reasonable strategy to seek a transformation from the speech mode that predicts head pose. Several previous authors have shown that such prediction is possible, but their experiments are typically confined to rigidly produced dialogue. Natural, expressive, emotive and prosodic speech exhibits motion patterns that are far more difficult to predict, with considerable variation in the expected head pose. Recently, Long Short-Term Memory (LSTM) networks have become an important tool for modelling speech and natural language tasks. We employ deep Bi-Directional LSTMs (BLSTMs), which are capable of learning long-term structure in language, to model the relationship between speech and rigid head motion. We then extend our model by conditioning it on prior motion. Finally, we introduce a generative head motion model, conditioned on audio features, using a Conditional Variational Autoencoder (CVAE). Each approach mitigates the problems of the one-to-many mapping that a speech-to-head-pose model must accommodate.
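The generative idea in the abstract (a CVAE conditioned on audio features) can be illustrated with a toy sketch. This is not the authors' model: their BLSTM networks are replaced here by hypothetical linear maps (`linear`, `enc_mu_w`, `enc_lv_w`, `dec_w` are illustrative names), the pose is assumed to be a 3-vector and the audio feature a 2-vector, and the reconstruction term assumes a unit-variance Gaussian decoder so that it reduces to half the squared error. It shows the two ingredients of a CVAE: training minimizes reconstruction error plus a KL term, and generation draws the latent from the prior, so one audio input can yield many plausible poses.

```python
import math
import random

# Toy CVAE sketch (hypothetical): linear maps stand in for the paper's networks.

def linear(weights, x):
    """Hypothetical linear layer: each row of `weights` produces one output."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def reparameterize(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))

def cvae_loss(pose, audio, enc_mu_w, enc_lv_w, dec_w, rng):
    """Single-sample negative ELBO (up to an additive constant) for one frame,
    assuming a unit-variance Gaussian decoder."""
    # Encoder q(z | pose, audio) sees both the target pose and the audio condition.
    h = pose + audio  # list concatenation
    mu, log_var = linear(enc_mu_w, h), linear(enc_lv_w, h)
    z = reparameterize(mu, log_var, rng)
    # Decoder p(pose | z, audio) reconstructs the pose from latent plus condition.
    recon = linear(dec_w, z + audio)
    recon_err = 0.5 * sum((r - p) ** 2 for r, p in zip(recon, pose))
    return recon_err + kl_to_standard_normal(mu, log_var)

def generate(audio, dec_w, latent_dim, rng):
    """At test time only audio is known, so z is drawn from the prior N(0, I)."""
    z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
    return linear(dec_w, z + audio)

# Illustrative shapes (assumed): 3-D pose, 2-D audio feature, 2-D latent.
rng = random.Random(0)
pose, audio = [0.1, -0.2, 0.05], [0.5, 1.0]
enc_mu_w = [[0.1] * 5, [-0.1] * 5]       # 2 latent dims x (3 pose + 2 audio)
enc_lv_w = [[0.0] * 5, [0.0] * 5]
dec_w = [[0.1] * 4 for _ in range(3)]    # 3 pose dims x (2 latent + 2 audio)

print("loss:", cvae_loss(pose, audio, enc_mu_w, enc_lv_w, dec_w, rng))
# Several draws for the same audio give different poses: a one-to-many mapping.
for _ in range(3):
    print(generate(audio, dec_w, 2, rng))
```

Because the decoder receives a fresh latent sample on every draw, the same audio frame maps to a distribution of poses rather than a single averaged pose, which is why a generative model suits a one-to-many mapping.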

Similar resources

Statistical Speech Enhancement Based on Probabilistic Integration of Variational Autoencoder and Non-Negative Matrix Factorization

This paper presents a statistical method of single-channel speech enhancement that uses a variational autoencoder (VAE) as a prior distribution on clean speech. A standard approach to speech enhancement is to train a deep neural network (DNN) to take noisy speech as input and output clean speech. Although this supervised approach requires a very large amount of pair data for training, it is not...

Conditional Variational Autoencoder for Prediction and Feature Recovery Applied to Intrusion Detection in IoT

The purpose of a Network Intrusion Detection System is to detect intrusive, malicious activities or policy violations in a host or host's network. In current networks, such systems are becoming more important as the number and variety of attacks increase along with the volume and sensitiveness of the information exchanged. This is of particular interest to Internet of Things networks, where an ...

An Uncertain Future: Forecasting from Static Images Using Variational Autoencoders

In a given scene, humans can often easily predict a set of immediate future events that might happen. However, generalized pixel-level anticipation in computer vision systems is difficult because machine learning struggles with the ambiguity inherent in predicting the future. In this paper, we focus on predicting the dense trajectory of pixels in a scene — what will move in the scene, where it w...

Shape optimization in laminar flow with a label-guided variational autoencoder

Computational design optimization in fluid dynamics usually requires to solve non-linear partial differential equations numerically. In this work, we explore a Bayesian optimization approach to minimize an object’s drag coefficient in laminar flow based on predicting drag directly from the object shape. Jointly training an architecture combining a variational autoencoder mapping shapes to laten...

CDVAE: Co-embedding Deep Variational Auto Encoder for Conditional Variational Generation

Problems such as predicting an optical flow field (Y) for an image (X) are ambiguous: many very distinct solutions are good. Representing this ambiguity requires building a conditional model P(Y|X) of the prediction, conditioned on the image. It is hard because training data usually does not contain many different flow fields for the same image. As a result, we need different images to share...

Publication date: 2017